Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools
Abstract
This paper summarises the contributions of the teams at the University of Helsinki, Uppsala University and the University of Turku to the news translation tasks for translation from and into Finnish. Our models address the problems of morphology and data coverage in various ways. We introduce a new, efficient tool for word alignment and discuss factorisations, gappy language models and reinflection techniques for generating proper Finnish output. The results demonstrate once again that adding training data is the most effective way to increase translation performance.
Similar resources
A Feature-rich Supervised Word Alignment Model for Phrase-based Statistical Machine Translation
Word alignment plays an important role in statistical machine translation (SMT) systems. The output of word alignment can be used to build a phrase table, which is the core model in the decoding of new sentences. Most current SMT systems use GIZA++, a generative model, to automatically align words from sentence-aligned parallel corpora. GIZA++ works well when large sentence-aligned corpora are ...
English-Latvian SMT: knowledge or data?
When phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, the translated texts are often not fluent because of misused inflectional forms and wrong word order between phrases or even inside a phrase. One possible way to improve translation quality is to apply factored models. The paper presents work on Englis...
Inner-Outer Bracket Models for Word Alignment using Hidden Blocks
Most statistical translation systems are based on phrase translation pairs, or “blocks”, which are obtained mainly from word alignment. We use blocks to infer better word alignments, and the improved word alignments in turn lead to better inference of blocks. We propose two new probabilistic models based on the inner-outer segmentations and use EM algorithms for estimating the models’ paramete...
Hybrid Example-Based SMT: the Best of Both Worlds?
Way and Gough (2005) provide an in-depth comparison of their Example-Based Machine Translation (EBMT) system with a Statistical Machine Translation (SMT) system constructed from freely available tools. According to a wide variety of automatic evaluation metrics, they demonstrated that their EBMT system outperformed the SMT system by a factor of two to one. Nevertheless, they did not test their ...
Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction
This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English-to-Finnish translation system). Our methods use unsupervised morphology induction. Unlike previous work, we focus on morphologically productive phrase pairs – our decoder can combine morphemes across phrase bo...